demonstrates that computer-based laboratories can help students learn targeted
the microeconomics discovery world by first-year university students, and compares
Inference and Discovery
in an Exploratory Laboratory

Valerie  Shute 
Robert  Glaser 
Kalyani  Raghavan 


Introduction 

Formulating and testing hypotheses using observations and empirical findings is
not only central to scientific work, but also to the acquisition of knowledge in
general. As new information is obtained and hypotheses are inferred, they serve as a
basis for confirming or refuting perceived regularities and lawful relationships. In the
research described here, we employ a computer laboratory, which we call an
intelligent discovery world, to study the strategies students use to explore this
environment. Our interest focuses on studying individual differences in strategies of
inference and discovery, including comparative studies of successful and less
successful learners, and eventually studies of tutorial assistance to discovery skills.


The central problem of induction and hypothesis formation is to carry out
cognitive performances that ensure that inferences drawn are plausible and relevant
to the world or system being observed. The plausibility of inductions and stated
hypotheses can be determined with reference to knowledge obtained about the
system. Thus the students' process of inference depends on the application of
observation, experimentation, and data organization that enable the specification and
testing of the knowledge obtained through experiments, hypotheses, and
confirmations. As Holland, Holyoak, Nisbett and Thagard (1986) wrote: "The study
of induction, then, is the study of how knowledge is modified through its use" (p. 5).

The kind of learning that we are considering has a reasonably long research
history in experimental psychology, mostly in the context of laboratory and
knowledge-lean tasks. In recent years, research has taken place in more complex
situations, as well as in studies of machine learning, experimental studies, and
computer simulation of problem solving and discovery tasks (Klahr & Dunbar, 1987;
Kuhn & Phelps, 1982; Langley, Simon, Bradshaw & Zytkow, 1987). Still, relatively
little work has investigated the domains taught in schools and formal education.
Some exceptions are studies of microworlds in physics (Champagne & Klopfer, 1982;
DiSessa, 1982; White, 1983; White & Horowitz, 1987).


As indicated, in inductive problem solving, information can be present in the
environment, and the problem solver must attempt to find a general principle or
structure that is consistent with this information. Scientific induction is an important
example of this, as is medical and technical diagnosis, in which a set of symptoms is
presented and the task is to induce the fault or cause. To paraphrase Greeno and
Simon's description:

Solving an induction problem can proceed in two ways, and in most tasks
a combination of the methods is used. A top-down method involves
generating hypotheses about the structure and evaluating them with
information about the observed instances. A bottom-up method involves
storing information about observations and events and making judgments
about new events on the basis of similarity or analogy to the stored
information. To perform the top-down method, the problem solver requires
a procedure that generates or selects hypotheses, a procedure for evaluating
hypotheses, and then a way of using the hypothesis generator to modify or
replace hypotheses that are found to be incorrect. To use the bottom-up
method, the problem solver needs a method of extrapolating from stored
information, either by judging similarity of new stimuli to stimuli stored in
memory or by forming analogical correspondences with stored information
(1984, p. 82).

To a large extent, classic studies of induction have focused on inducing a rule or
classifying relatively abstract stimuli into categories on the basis of feedback about
classification errors and other information (see Pellegrino & Glaser, 1980; Smith &
Medin, 1981). Given our concern for exploratory environments, we perceive this large
literature as pertaining, for the most part, to passive induction in which the learners
induce rules, make hypotheses, and classify and taxonomize observations on the basis
of sets of predetermined instances designed by the experimenter. However, a more
active process is apparent when the learner can select variables, design instances, and
interrogate his or her existing knowledge and memory for recent events. In the latter
form of induction, we need a research paradigm that allows us to examine active
experimentation in which learners explore and generate new data and test hypotheses
with the data they have accumulated in the course of their investigations. Recent
experimental technology and computer modeling have made this type of
experimentation feasible (Bonar, Cunningham & Schultz, 1980; Michalski, 1980;
Yazdani, 1986).

In our research program, we have been investigating the learning of topics in
elementary physics, basic electronics, and economics. In this chapter, we report on
the economics world, called Smithtown. The environments we design enable us to
investigate a range of inductive or discovery learning, from learning in purely
discovery environments to more guided discovery worlds. What we are learning from
our work is that as students explore phenomena, they can be guided and coached in
the interrogation of a subject matter, analyzing their own understandings and
misunderstandings, assessing progress toward their goals, and revising their problem
solving and learning strategies.

Our exploratory systems are designed to record, structure, and play back to
students their own problem solving processes. Such systems have been developed in
algebra and geometry, where they provide a structured "trace" of problem solutions
so that students can see the alternative paths that they have tried (Anderson, Boyle,
Farrell, & Reiser, 1984; Brown, 1983). Previous papers report early work (Reimann,
1986; Shute & Glaser, in press) and this paper describes an initial study of individual
differences in exploration, data collection, and hypothesis formation in an exploratory
world of microeconomic laws.

Smithtown is a computer program that provides a discovery environment for
learning elementary microeconomics. An ideal sequence of iterative behaviors in
Smithtown would include: exploring the world (informally), developing a plan for
investigation (more formally), choosing on-line tools or techniques for executing the
plan, collecting and recording data from the experiment, organizing the results, seeing
if the data confirm or negate prior beliefs, constructing a problem representation,
modifying the problem based on discrepant results, refining the problem based on
additional information, recognizing discrepancies between the result and expectations,
testing out findings in additional realms, and finally, generalizing a principle or law.


The focus of the study we will be discussing is on students' "inductive inquiry
skills," which in this context refers to the students' effectiveness in collecting,
organizing, and understanding data, concepts, and relationships in a new domain.
This system has been implemented on a Xerox 1108 Lisp machine, allowing
self-paced, individualized, and interactive instruction in a rich data source (see Shute &
Glaser, in press, for an overview of the system).

We  hypothesize  that  discovery  learning  can  contribute  to  a  rich  understanding 
of  domain  information  by  enabling  the  student  to  access  and  organize  information. 
Furthermore,  a  proposition  to  be  evaluated  in  this  work  is  that  effective  interrogative 
skills  are  teachable  if  the  particular  skills  involved  can  be  articulated  and  practiced 
under  circumstances  which  require  them  to  be  used. 

Intelligent tutorial guidance, in conjunction with a discovery world
environment, can potentially transform a student's problem solving performance into
efficient learning procedures rooted in an individual's own actions and hypotheses. In
such experiential learning, students interact with new subject-matter situations,
comparing their observations with their current beliefs and theories, which may be
rejected, accepted, modified, or replaced (see Glaser, 1984). In the course of this
developing knowledge, students ask questions, make predictions, make inferences, and
generate hypotheses about why certain events occur with systematic regularity.
Significant experience of this kind in discovering principles in a field of knowledge
should alter the relation learners perceive between themselves and the knowledge,
and their way of behaving when they forget a solution procedure or encounter an
unprecedented problem (Cronbach, 1986).

We report on the results of an empirical study conducted using Smithtown.
The report is divided into five sections: Knowledge Bases or "Experts" in
Smithtown, Maneuvering through Smithtown with On-line Tools, Learning and
Individual Differences, the Results, and a General Discussion.

Knowledge  Bases  in  Smithtown 

The  primary  purpose  of  the  system  is  to  help  students  become  more  methodical 
and  scientific  in  learning  a  new  domain.  The  first  knowledge  base  or  "expert"  we 
will  discuss  deals  with  efficacious  inquiry  skills. 

The  First  Knowledge  Base:  Inductive  Inquiry  Skills 

An earlier study, conducted with Smithtown, yielded information about more
and less effective behaviors for interrogating and inducing information from a new
domain (reported in Shute & Glaser, in press). This information was subsequently
coded into rules that the system monitors in conjunction with a learner's actual
behaviors. Thus, the system knows of sequences of good behaviors and also sequences
of ineffective or "buggy" behaviors.

The system leaves a student alone if s/he is performing adequately in the
environment. However, if the system determines that a student is floundering or
demonstrating buggy behaviors, the Coach will intervene and offer assistance on the
specific problematic behavior(s). For instance, if a student persists in changing many
variables at one time without first collecting baseline data into the on-line notebook,
the rule that would be invoked would look like the following (paraphrased):

If - The student changes more than two variables at a time prior to
collecting baseline data for a given market, and it is early in the
session where the experiment number is less than four,

Then - Increment the "Multiple Variable Changes" bug count by 1 and
pass the list to the Coach for possible assistance.


If this rule does get fired and the number of times it has been invoked has
surpassed some threshold value (e.g., four times), then the Coach would appear and
say:

"I see that you're changing several variables at the same time. A better
strategy would be to enter a market, see what the data look like before any
variables have been changed, then just change one variable while holding
all the others constant."
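
The paraphrased rule and its threshold can be illustrated with a short Python
sketch. This is an illustration only, not the Smithtown implementation (which ran
on a Lisp machine): the class and function names, the way baseline data is
tracked, and the exact threshold of four are assumptions drawn from the
paraphrase above.

```python
from dataclasses import dataclass, field

COACH_THRESHOLD = 4       # assumed from "e.g., four times" in the text
EARLY_SESSION_LIMIT = 4   # rule applies while the experiment number is below four
BUG_NAME = "Multiple Variable Changes"
COACH_MESSAGE = (
    "I see that you're changing several variables at the same time. "
    "A better strategy would be to enter a market, see what the data look "
    "like before any variables have been changed, then just change one "
    "variable while holding all the others constant."
)


@dataclass
class StudentState:
    """Hypothetical snapshot of what the monitor knows about a student."""
    experiment_number: int = 1
    baseline_recorded: bool = False   # baseline data entered in the notebook?
    bug_counts: dict = field(default_factory=dict)


def check_multiple_variable_changes(state, variables_changed):
    """Apply the paraphrased rule; return a Coach message or None."""
    rule_fires = (
        variables_changed > 2
        and not state.baseline_recorded
        and state.experiment_number < EARLY_SESSION_LIMIT
    )
    if not rule_fires:
        return None
    # "Then" part of the rule: increment the bug count by 1.
    state.bug_counts[BUG_NAME] = state.bug_counts.get(BUG_NAME, 0) + 1
    # The Coach only speaks once the count has surpassed the threshold.
    if state.bug_counts[BUG_NAME] > COACH_THRESHOLD:
        return COACH_MESSAGE
    return None
```

Under these assumptions the Coach stays silent for the first four firings of the
rule and intervenes on the fifth, which matches the description of leaving the
student alone unless the buggy behavior persists.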


In addition to the rules monitored by the system, we developed a list of
performance measures or "learning indicators" that enable us to determine what type
of actions or behaviors yield better performance in this type of environment. A range
of learning indicators was created, from low-level, simple counts of actions (e.g., total
number of activities taken within Smithtown) to higher-level, complex behaviors (e.g.,
number of times a manipulation to an independent variable was made that showed
an obvious change in the dependent variables). These indicators will be discussed in
a later section and serve as one data source for our study on individual differences in
learning in Smithtown.

The Second Knowledge Base: Economic Concepts in Smithtown

The second knowledge base or "expert" in the system knows about the
functional relationships among economic variables which comprise valid economic
concepts and laws. The system has a defined instructional domain, which is
decomposed into key concepts that are organized in a bottom-up manner (i.e., from
simpler to more complex ideas). An understanding of these concepts should result
from the student's experiments in the microworld. The hierarchy of domain
knowledge was developed by first reviewing six introductory microeconomics
textbooks and determining the presentation order of information and second,
discussing the optimal ordering of these concepts for student learning in the
classroom with a college instructor of economics.

Although a student is not required to learn the concepts in any prescribed
order, the hierarchy shown in Figure 1 provides the system with information about
where the student is likely to be with regard to his/her knowledge acquisition. That
is, the concept of "equilibrium" can be more readily understood after the laws of
supply and demand have been learned.

For  the  reader  unfamiliar  with  this  domain,  we  will  now  describe  the  basic 
concepts  in  microeconomics  that  can  be  learned  using  Smithtown. 

Supply and Demand. The buyer's side of the market is called demand. The
law of demand states that the quantity of a product which consumers would be
willing and able to purchase during some period of time is inversely related to the
price of the product. If the price of gasoline goes up, consumers will demand a
smaller quantity of gasoline; if the price goes down, consumers will demand larger
quantities. If we graph this relationship, we get what is called a demand curve (see
Figure 2) showing how the quantity demanded of a product will change as the price
of that product changes, holding all other factors constant.

The seller's side of the market is called supply. The law of supply is that the
quantity of a product which producers would be willing and able to produce and sell
is related to the price of the product by a positive function. If the price of color
televisions goes up, producers will tend to offer more television sets for sale. If the
price of color television sets goes down, producers will reduce the number of
television sets they put on the market. If we graph this relationship, we get what is
called a supply curve (see Figure 3). A supply curve is a graph showing how the
quantity supplied of some commodity will change as the price of that commodity
changes, holding all other factors constant.

Equilibrium, Surplus and Shortage. There are many factors that influence
the price of a given product, but when a price is reached where the quantity that
sellers want to sell is equal to the quantity that buyers want to buy, we say that the
market is at a point of equilibrium (see Figure 4). Competitive markets always tend
toward points of equilibrium. If the market price is higher than the equilibrium
price, buyers will demand smaller quantities than sellers are supplying. This will
create a surplus. Surpluses of unsold goods will convince sellers to lower their price
down toward the equilibrium level. If, for some reason, the market price is lower
than the equilibrium price, buyers will demand larger quantities than sellers are
supplying, thus creating a shortage. Shortages will lead to price increases, and the
price will rise toward the equilibrium level.
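
The relationship among surplus, shortage, and equilibrium can be made concrete
with a small Python sketch. The linear curves and their coefficients are
hypothetical and chosen only so the arithmetic is easy to follow; they are not
the functions used inside Smithtown, and the fixed price step is likewise an
assumption.

```python
def quantity_demanded(price):
    """Hypothetical linear demand curve: quantity falls as price rises."""
    return max(0.0, 100.0 - 2.0 * price)


def quantity_supplied(price):
    """Hypothetical linear supply curve: quantity rises as price rises."""
    return max(0.0, 3.0 * price - 20.0)


def market_state(price, tolerance=1e-9):
    """Classify the market at a given price."""
    gap = quantity_supplied(price) - quantity_demanded(price)
    if gap > tolerance:
        return "surplus"      # sellers offer more than buyers want
    if gap < -tolerance:
        return "shortage"     # buyers want more than sellers offer
    return "equilibrium"


def adjust_price(price, step=1.0):
    """Move the price one step in the direction that reduces the imbalance."""
    state = market_state(price)
    if state == "surplus":
        return price - step   # unsold goods push the price down
    if state == "shortage":
        return price + step   # unmet demand pushes the price up
    return price


# Starting above equilibrium, repeated adjustment lowers the price until
# the two quantities are equal (price 24 for these particular curves).
price = 30.0
for _ in range(10):
    price = adjust_price(price)
```

With these curves, a price of 30 produces a surplus, a price of 20 produces a
shortage, and both sequences of adjustments settle at the same price, where the
quantity supplied and the quantity demanded coincide.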

Changes in Supply and Demand. A change in the price of a good will
influence the quantities demanded and supplied and cause movement along a fixed
curve. A change to variables other than price will cause the entire curve (demand or
supply) to shift, depending on which variable is changed and the magnitude of the
adjustment. We refer to the variables in Smithtown that can be manipulated as
"town factors," and they include: per capita income, population, interest rates,
weather, consumer preferences, labor costs, number of suppliers, and the price of
substitute and complementary goods. For instance, if the population of Smithtown
were increased from 10,000 to 25,000 persons, then the demand for automobiles would
increase, resulting in a shift to the right of the demand curve for cars. Alternatively,
if the number of suppliers of a particular good were to decrease, this would affect the
supply curve for that commodity, resulting in a shift to the left. These shifts are
depicted in Figures 5 and 6.

New Equilibrium Point. Competitive markets tend to converge toward
equilibrium points. Equilibrium, once established, can be disturbed by changes in
demand and/or supply. If demand and/or supply change, a surplus or shortage will
result at the original price, and the price will move toward a new equilibrium. A
shortage at the original price will cause the old price to rise to the new level and
cause changes in the quantities supplied and demanded. A new equilibrium will be
established at the second price and the second quantity and may be seen in Figure 7.
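
A shift in demand and the resulting new equilibrium can be sketched as follows.
The example reuses the population change from 10,000 to 25,000 mentioned
earlier, but the linear curves, the assumption that the demand intercept scales
with population, and all coefficients are hypothetical and serve only to
illustrate the idea.

```python
def demand(price, population):
    """Hypothetical demand curve whose intercept grows with population."""
    return population / 250.0 - 1.0 * price


def supply(price):
    """Hypothetical supply curve, unaffected by population."""
    return 10.0 + 2.0 * price


def equilibrium(population):
    """Solve demand == supply for the two linear curves above.

    population/250 - p = 10 + 2p  =>  p = (population/250 - 10) / 3
    """
    price = (population / 250.0 - 10.0) / 3.0
    return price, supply(price)


old_price, old_quantity = equilibrium(10_000)
new_price, new_quantity = equilibrium(25_000)

# At the original price, the larger population demands more than is
# supplied: a shortage, which pushes the price toward the new equilibrium.
shortage_at_old_price = demand(old_price, 25_000) - supply(old_price)
```

Changing only the price moves the market along one of these curves; changing the
population replaces the demand curve with a new one further to the right, so
both the equilibrium price and the equilibrium quantity rise.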

In addition to the above economic concepts, at least two more can be extracted
from the discovery world, although they are not explicitly recognized by the system:
cross elasticity of demand and supply. Cross elasticity of demand indicates how a
change in one market affects the demand in a related market, while cross elasticity of
supply indicates how a change in one market affects the supply in a related market.

Maneuvering  through  Smithtown  with  On-line  Tools 

Students can discover regularities in the market by manipulating variables,
observing effects, and using tools to organize the information in an effective way.
The on-line tools for scientific investigations in Smithtown include a notebook for
collecting data, a table to organize data from the notebook, a graph utility to plot
data, and a hypothesis menu to formulate relationships among variables. Three
history windows allow the students to see a chronological listing of actions, data, and
concepts learned.

First, a student selects a market to investigate from the "Goods Menu" and
informs the system of his or her experimental intentions by choosing variables s/he is
interested in from the "Planning Menu." For each new experiment, the system asks
the student if s/he would like to make a prediction regarding the planned
experiment. If the student chooses "No," the next menu of options is the "Things To
Do Menu." If the student responds "Yes," a window appears where specific
statements can be entered about predicted outcomes to a planned manipulation. For
example, if the student's experiment was to increase the price of gasoline in order to
see the repercussions in the market place, one prediction could be: "The quantity
demanded (of gasoline) will decrease." Explorations and experiments are directed
from the "Things To Do Menu," where students are provided with 10 options. Each
option is described below.

1.  See  market  sales  information.  This  window  displays  information  on 
the  current  state  of  the  market. 

2. Computer adjust price. The computer will increase or decrease the
price, whichever brings the current market closer to equilibrium.

3.  Self  adjust  price.  Provides  the  student  with  an  on-line  calculator  and 
allows  the  price  of  the  particular  good  to  be  changed. 

4.  Make  a  notebook  entry.  The  student  selects  variables  to  record,  and 
the  current  values  are  automatically  put  into  the  notebook  (see  Figure  8). 

5. Set up table. The table package allows the student to select variables of
interest from the notebook, put them together in a table, and sort on any
selected variable, by ascending or descending order (see Figure 9).

6.  Set  up  graph.  The  graph  utility  allows  a  student  to  plot  data  collected 
from  his/her  explorations  and  experiments.  This  provides  an  alternative 
way  of  viewing  relations  between  variables  (see  Figure  10). 

7. Make a hypothesis. The hypothesis menu allows students to make
inductions or generalizations from relationships in the data they have
collected and organized. There are actually four interconnected menus of
words and phrases comprising the hypothesis menu (see Figure 11). First,
the "connector menu" includes the items: if, then, as, when, and, and
the. Next, the "object menu" contains the economic indicator variables
used by the system. The "verb menu" describes the types of change, like
decreases, increases, shifts as a result of, and so on. Finally, the
"direct object menu" allows for more precise specification of concepts
such as: over time, along the demand curve, changes other than price,
etc. As students combine words or phrases from these menus, the
resultant statement appears in a window below. A pattern matcher
analyzes key words from the input and checks whether this matches
stored relationships for each targeted concept. For instance, if the
student stated: "As price increases, quantity demanded decreases," the
system would match that to the law of demand, which it understands to
be the inverse relationship between price and quantity demanded.

8. Experimental frameworks. There are three "experimental
frameworks" which provide the student with easy maneuvering within
and between experiments. These include: Change Good, Same
Variable(s); Same Good, Change Variable(s); and Change Good, Change
Variable(s). They are used to change to a new market while holding the
independent variables the same, change town factor(s) within the current
market, or to change the town factor(s) and the market, respectively.

9. History Windows. Three history windows are included in the system,
accessible by both the students and the system. As students continue to
interact with Smithtown, histories accumulate, delineating the various
actions resulting from different explorations and experiments. This
summary is maintained in the Student History window. The Market
History window keeps a record of all variables and associated values that
the student has manipulated. Finally, there is the Goal History window.
This provides a representation of what the student has successfully
learned in terms of concepts targeted by the system.

Shute, Glaser, Raghavan 13 February 1988
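
The keyword matching described under option 7 (Make a hypothesis) can be
illustrated with a minimal Python sketch. The stored relationships, the variable
and verb vocabularies, and the function names are hypothetical; the text says
only that a pattern matcher compares key words against stored relationships for
each targeted concept, so this is one plausible reading rather than the system's
actual matcher.

```python
import re

# Hypothetical vocabulary drawn from the "object" and "verb" menus.
VARIABLES = ("quantity demanded", "quantity supplied", "price")
VERBS = ("increases", "decreases")

# Hypothetical stored relationships: a concept is matched when every
# (variable, verb) pair in one of its variants appears in the statement.
CONCEPTS = {
    "law of demand": [
        {("price", "increases"), ("quantity demanded", "decreases")},
        {("price", "decreases"), ("quantity demanded", "increases")},
    ],
    "law of supply": [
        {("price", "increases"), ("quantity supplied", "increases")},
        {("price", "decreases"), ("quantity supplied", "decreases")},
    ],
}

PAIR_PATTERN = re.compile(
    r"(%s)\s+(%s)" % ("|".join(VARIABLES), "|".join(VERBS))
)


def extract_pairs(statement):
    """Return the set of (variable, verb) pairs found in a statement."""
    return set(PAIR_PATTERN.findall(statement.lower()))


def match_hypothesis(statement):
    """Return the name of the matched concept, or None if nothing matches."""
    pairs = extract_pairs(statement)
    for concept, variants in CONCEPTS.items():
        if any(variant <= pairs for variant in variants):
            return concept
    return None
```

For the example statement in the text, "As price increases, quantity demanded
decreases," this sketch extracts the pairs (price, increases) and (quantity
demanded, decreases) and reports the law of demand; a statement in which both
move in the same direction matches neither variant of that law.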


Learning  and  Individual  Differences 

In this section, we describe an exploratory study of learning and individual
differences in performance in this intelligent discovery world environment. The
system was able to categorize sequences of student actions as being more or less
effective and intervened with a hint at times when the student was floundering.

This study was undertaken with two main goals in mind. One goal was to
evaluate Smithtown to see if individuals interacting with it actually acquired any of
the economic concepts embedded in the environment (e.g., the law of demand,
equilibrium point, and so on). The second goal was to determine the performance
characteristics of those individuals who were more successful in learning in this type
of environment as compared to those less successful. Another implicit goal was to
examine the computer architecture and interface features that facilitated or overly
constrained an exploratory environment.




The kind of inference-discovery task that we are studying has been interpreted
within a problem solving framework by Klahr and Dunbar (1987), who conceive of the
interplay between hypothesis formation and experimental design phases of the
discovery process as a search between two problem spaces: a hypothesis space of rules
and an experimental space of instances. "This means that, first, we need to account
for the identification of relevant attributes, for, unlike the conventional
concept-formation studies, our situation does not present the subject with a highly
constrained attribute space for hypotheses. Second, we need a more complex
treatment of the instance generator, because in our context it consists of an
experiment, its predicted outcome, and the observation of the actual outcome" (p. 8).
Klahr and Dunbar place their subjects in a discovery context by first teaching them
how to use an electronic device (a computer-controlled robot tank called "BigTrak")
and then asking them to discover how a particular function works. They formulate a
general model of scientific discovery as dual search that shows how search in the two
problem spaces shapes hypothesis generation, experimental design, and the evaluation
of hypotheses. Strategy differences among subjects were a consequence of the
efficiency of search in the hypothesis space. Successful subjects were classified as
theorists, and others who abandoned hypothesis testing in order to search the
experiment space were classified as experimenters.


In our investigation we also take a problem solving perspective and are guided
in our search for individual differences by certain general findings in problem solving
performance. For example, Sternberg (1981) makes a distinction between two forms
of metacognitive performance: global planning and local planning. Global planning
refers to a strategy that applies to a set of problems and does not focus on the
characteristics of a particular problem; global planning directs attention to the context
or overall characteristics of the group of problems. Local planning refers to a
strategy that is sufficient for solving a particular problem within a given set; local
planning is less sensitive to general context and focuses more on the difficulty of
carrying out the specific operations of a problem solving task. Sternberg finds that
better reasoners spend relatively more time in global planning of a strategy for
problem solution and relatively less time in local planning. Such a distinction is also
evident in studies of expert-novice problem solving. In studies of writing, Hayes and
Flower (1986) point out that experts attend more to global problems than do novices.
Experts and novices attend to different aspects of a text. Novices focus on the
conventions and rules of writing; experts make more changes that affect the text's
meaning. The perceptions of the novices are more local or shallow, and those of the
expert more global and overall meaningful. The strategies used by novices are local
strategies concerned with the deletion and addition of words and phrases, whereas
experienced writers are concerned more with strategies that involve changes in
content and structure. In physics (Larkin, McDermott, Simon & Simon, 1980; Simon
& Simon, 1978), differences in problem solving between novices and experts also
relate to surface and deep problem representations. The novice's representation of a
problem results in a local form of problem solving in which they work with equations
to solve the unknown. Experts, in contrast, work in a more top-down manner,
indicating that a general solution plan is in place before they begin the manipulation
of specific equations.

The above findings direct our attention to conceivable differences between good
and poor inductive problem solvers in terms of the global and local aspects of their
performance or their attention to specific versus more general features of the problem
solving task. In a discovery situation, taking a lead from Klahr and Dunbar, we
translate this distinction to data-driven performance in contrast to behavior which is
more rule- or hypothesis-driven. In our task, an individual starts out with attention
to computer-generated observations and/or to subject-designed experiments. On the
basis of these data, he or she induces generalizations or hypotheses which drive the
further data collection, data organization and experimentation. Based on the
problem solving literature described above, we can anticipate that good reasoners
might display rule-driven performance earlier in their discovery activity, and use
rules as a performance goal in contrast to more sustained attention to data collection,
although the latter is necessary at certain points in the course of discovery.

Furthermore, in addition to behaviors at a general level, we must also look at
more direct performance components. We refer to specific performance heuristics
manifested by good reasoners that may not be available to others. A good example
in discovery performance is the heuristic of identifying one variable as a dimension of
examination and holding all other variables constant while the chosen one is varied
systematically. Lawler (1982), in discussing computer-based microworlds that use
the Logo language, refers to this as variable-stepping. He points out that Piaget judged
variable-stepping to be an essential component of formal operational thought, a
powerful idea because it is universally useful and crucial to the process of scientific
investigation. In this regard we look for individual differences in our discovery
worlds that relate to such performance procedures.
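
The variable-stepping heuristic described above can be sketched in a few lines of Python. This is only an illustration: the function names, the variable names, and the toy demand rule are all assumptions made for the example and are not taken from Smithtown.

```python
from typing import Dict, Iterator, List


def variable_stepping(baseline: Dict[str, float], variable: str,
                      values: List[float]) -> Iterator[Dict[str, float]]:
    """Yield one experimental setting per value of `variable`.

    Every other variable keeps its baseline value, so a change in the
    outcome can be attributed to the stepped variable alone.
    """
    if variable not in baseline:
        raise KeyError(f"unknown variable: {variable}")
    for value in values:
        setting = dict(baseline)   # copy, so the baseline is never mutated
        setting[variable] = value
        yield setting


def toy_quantity_demanded(setting: Dict[str, float]) -> float:
    """Hypothetical linear demand rule, invented for this illustration only."""
    return (1000.0 - 200.0 * setting["price"]
            + 0.05 * setting["population"]
            + 0.01 * setting["income"])


baseline = {"price": 2.0, "population": 10000.0, "income": 20000.0}
for setting in variable_stepping(baseline, "price", [1.0, 2.0, 3.0]):
    # Only price differs between rows; population and income stay fixed.
    print(setting["price"], toy_quantity_demanded(setting))
```

Because each setting is a copy of the baseline with a single entry replaced, any difference between successive observations can only come from the stepped variable, which is the point of the heuristic.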

As a general caveat in the work reported here, it is important to point out that
scientific discovery involves a whole array of processes including observing and
gathering data, finding regularities that describe the data, formulating and testing
the generalizability and limitations of these regularities, and formulating and testing
explanatory theories. In this study we are primarily concerned with a subset of these
processes, principally with discovery that starts with a dataset that can be
investigated and that derives descriptive rules, laws, or regularities from them. As
has been pointed out (Bradshaw, Langley & Simon, 1983), "the generation of data,
and even the invention of instruments to produce new kinds of data are also
important aspects of scientific discovery. And in many cases, existing theory, as well
as data, steer the course of discovery" (p. 971). In this chapter, we consider the path
from data to descriptive laws about data (not necessarily explanatory theories). This
subset of scientific work is important in discovery and in our concern with individual
differences in induction from data, and the process by which inductive discovery is
carried out. Also to be kept in mind is the fact that data-driven induction is not
completely "pure." Individuals come with previous conceptions of regularities in the
data and they manipulate data and experiment on the basis of hypotheses they
generate. So, the discovery process that we study here will involve some combination
of data-driven induction and hypotheses-generated data which guide performance.

Subjects. Three groups of subjects were involved in the experiment: (1) students
who received traditional classroom instruction in introductory economics, (2) a
control group which received no economics instruction, and (3) students interacting
with Smithtown. There were ten subjects in each group. All subjects were from the
University of Pittsburgh and none had any formal economics training or previous
economics courses. The economics group were students who volunteered to
participate in an experiment and who were enrolled in an introductory
microeconomics course. About half of the control group consisted of psychology
students who took the tests for class credit; the other half consisted of students
selected from those who responded to ads placed around the campus for subjects
who had no economics background. They took the tests and received a small
payment. The experimental group were individuals who similarly responded to ads
placed around the University of Pittsburgh campus. They were paid for their
participation. It should be noted that the chapters covered by the economics class
during the testing interval corresponded to the identical material/curriculum
covered by Smithtown (i.e., the same introductory economic principles involving the
laws of supply and demand in a competitive market). All subjects were debriefed
about the purpose of the experiment at its conclusion.

Test Materials. The test battery on microeconomics was developed by an
economics instructor at the University of Pittsburgh. The tests were initially piloted
by individuals who provided feedback about the tests in terms of the clarity of
instructions, the timing of the tests, and the general level of difficulty. The battery
consisted of two tests, multiple choice and short answer. After test development, the
batteries were reviewed by an independent economics instructor for content validity
(i.e., completeness and accuracy).

1. MULTIPLE CHOICE TEST: Two alternate forms were created for the pre-
and post-test. This test involved knowledge of various concepts and principles of
microeconomics. Subjects had to circle the best answer from the four alternatives
given. An example of a pre-test item from the test is:

The supply curve of houses would probably shift to the left (decrease) if:

(a) construction workers' wages increased

(b) cheaper methods of prefabrication were developed

(c) the demand for houses showed a marked increase

(d) the population increased

A  corresponding  post-test  item  was  constructed  for  each  of  the  pre-test  items. 
The  counterpart  to  the  above  question  is: 

Which  of  the  following  is  likely  to  move  a  supply  curve  for  beef  to  the  right  (an 
increase)? 

(a)  a  rise  in  the  price  of  beef 

(b)  a  decrease  in  the  price  of  cattle  feed 

(c)  an  increase  in  the  wages  of  farm  laborers 

(d)  a  decrease  in  the  price  of  raw  hides 

2. SHORT ANSWER TEST: This test involved the same concepts to be
defined by the subject for both the pre- and the post-tests. It required elaborated
knowledge in terms of defining different concepts, coming up with instances of a
given concept, or drawing a curve on a labelled but empty grid. Two examples from
the short answer test are:

(a) What is market equilibrium?

(b) List as many important factors as you can causing the demand curve for a
good or service to shift over to the left or right.

Each answer on the short answer test was scored with reference to a list of
necessary and sufficient elements.

Procedures. Subjects from the economics group were administered a pre-test
battery in their class prior to the lectures and readings on the laws of supply and
demand. They received about two and one half weeks of instruction on this part of
the curriculum; they were then re-tested in the classroom with the post-test battery.

The control group completed the pre-test battery and then returned in about
two weeks for the post-tests. This interval corresponded to the pre- to post-test
intervals for the other two groups.

The experimental group took the pre-test battery individually, then signed up
for three additional two-hour sessions. This translated to a total of five hours on the
computer (Session 1 = pre-test battery plus demonstration of the system, Session 2
= 2 hours on the computer, Session 3 = 2 hours on the computer, and Session 4 = 1
hour on the computer and 1 hour for the post-test battery). The sessions were spread
out over a two-week period to correspond to the time frame of the economics
group and the control group. Prior to the first real session with the system, students
were given a Guide to Smithtown in Session 1. This informed them of their goal
(i.e., to discover principles and laws of economics) and how best to achieve that goal
(i.e., to imagine themselves as scientists, gathering data and forming and testing
hypotheses about emerging economic principles and laws). The Guide overviewed
some of the on-line tools available in Smithtown, with examples of how to
use them. Finally, the Guide emphasized that the individual would probably make
errors or get stuck, but should try to learn from the mistakes. A glossary of terms
concluded the Guide, and the students were free to take it home with them between
sessions.

Results

The first question addressed whether the three groups were initially comparable
on their pre-test battery scores (i.e., multiple choice, MC, and short answer, SA).
Table 1 shows the summary statistics for the raw data, while the mean percentage
scores for the pre-test battery and for the post-test battery, collapsed across MC and
SA, are plotted in Figure 12.

As can be seen in Table 1 and in Figure 12, the three groups are initially
comparable, while on the post-test, both the economics group and the experimental
group surpass the control group. First we computed an ANOVA (repeated measures
design where the grouping factor was treatment group and the trial factors were test
type and pre- versus post-test condition). The most important interaction that we
were interested in was pre/post tests by treatment group, collapsed across tests,
F(2, 27) = 2.99; p = .067. This shows that the three groups did differ in terms of their
pre- to post-test changes in scores. We then computed a Hotelling's T² test,
contrasting all three pairwise combinations of groups on the pre-test battery, yielding
the following nonsignificant T² values:

Economics x Control group: T² = 0.03, p = 0.77

Control x Experimental group: T² = 0.11, p = 0.42

Economics x Experimental group: T² = 0.03, p = 0.80
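
The p values listed above can be approximately recovered from the reported T² values under two assumptions that the text does not state explicitly: that each comparison used the two test scores (MC and SA) as the dependent measures, and that the reported statistic is on the trace scale, T²/(n1 + n2 - 2), rather than the classical Hotelling T². The Python sketch below is offered only as an illustration under those assumptions; the function name is hypothetical.

```python
def p_from_reported_t2(t2_reported, n1=10, n2=10):
    """Approximate p value for a two-group, two-measure Hotelling test.

    Assumptions (not stated in the text): the reported value equals
    T^2 / (n1 + n2 - 2), and the two measures are the MC and SA scores.
    """
    n_measures = 2
    df_error = n1 + n2 - 2                  # 18 for two groups of ten
    t2 = t2_reported * df_error             # classical Hotelling T^2
    df2 = n1 + n2 - n_measures - 1          # denominator df of the F test
    f_value = df2 / (n_measures * df_error) * t2
    # With 2 numerator df, the F upper-tail probability has a closed form.
    p_value = (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)
    return f_value, p_value


for label, t2 in [("Economics x Control", 0.03),
                  ("Control x Experimental", 0.11),
                  ("Economics x Experimental", 0.03)]:
    f_value, p_value = p_from_reported_t2(t2)
    print(f"{label}: F(2, 17) = {f_value:.2f}, p = {p_value:.2f}")
```

Under these assumptions the sketch gives p values of about .78, .41, and .78, close to the reported .77, .42, and .80; gaps of this size are what rounding of the reported T² values to two decimals would produce.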

After their respective interventions, the groups differed; however, the economics
group and the experimental group ended up with equivalent post-test scores. It is
important to note that students in the experimental group spent only five hours
interacting with the discovery world compared to 2.5 weeks (or about 11 hours) of
classroom lectures and recitation covering identical curricular information.

Hotelling's T² analysis allows us to see particular differences between
independent groups on their test scores. The mean vectors for each group can be
extracted from the summary statistics, above. First, a comparison between the
economics students and the control group was made on their post-test scores: T² =
1.02; p = .003. Thus, as expected, these two groups differed overall in their test
scores. Individual t-tests on the data showed that the difference is primarily
associated with the responses on the short answer post-test. The economics students
had much more complete and articulate responses than the control group (t = 4.28;
p = .0005). Second, the results from this analysis revealed that the economics group
and the experimental group performed the same not only on their pre-test scores, but
on their post-test scores as well: T² = 0.031; p = .774. The experimental group, with
significantly less time on task, performed comparably with the students in the
traditional classroom environment. No differences were found between any of the
individual tests. Third, the control and the experimental groups were compared. It
was expected that there would be a difference between these two groups in their test
composites given the experimental group's interaction with the system. This
comparison also showed a significant difference between the post-tests: T² = 1.24;
p = .001. Individual t-tests were generated for each of the tests, and the short answer
post-test, again, was the major reason for the differences (t = 4.25; p = .0005). The
experimental group had much more complete responses than the control group.

Individual Differences in the Experimental Group

The results from the between-group analyses suggest that overall, Smithtown
was effective in teaching a targeted set of microeconomic concepts, comparable to a
traditional classroom environment. We next examined the experimental
group data to see how differential interaction with this exploratory world affected
subsequent learning. In other words, some individuals learned more than others from
the system, and we wanted to know what it was that the more successful individuals
did in comparison to the less successful persons in extracting and understanding new
knowledge. "Successful," in this context, describes someone who started out with a
low pre-test score on the battery of economics tests and, after interacting with the
system, ended up with a high post-test score. Thus, the two interesting comparisons are
between those scoring: (1) low on the pre-test and low on the post-test, and (2) low
on the pre-test but high on the post-test. We were not interested in those who scored
high on both the pre- and the post-tests as they seemed to have started out with
some domain-related knowledge. Table 2 shows each of the ten experimental subjects
with their associated pre- and post-test scores (percent correct).

Our  interest  is  in  comparing  individuals  who  scored  above  the  mean  gain  score 
and  below  it.  Thus,  there  is  a  pool  of  five  subjects  having  large  gains  and  five 
subjects  with  small  gains.  These  subjects  will  be  discussed  after  the  presentation  of 
the learning indicators.

Table 3 is a listing of the performance measures or learning indicators that were
computed for each individual across sessions. For this exploratory study, we
collapsed data from the sessions into a single index for each indicator, although
changes over time will be informative to look at in the future. Two data sources
were used in computing these values: (1) detailed computer history lists of all
student actions, and (2) verbal protocols from each student about justifications for
each action, what they expected to see after a particular action, and what their plans
were for further experimentation.

Comparison  of  Subjects.  BW,  CF,  HT  and  OY  all  began  the  experiment  at 
about  the  same  level  of  knowledge,  measured  by  pre-test  scores,  but  after  the  sessions 
with  Smithtown,  subjects  BW  and  CF  (more  successful)  greatly  surpassed  subjects 
HT and OY (less successful) on the post-test battery. In terms of gain scores (i.e.,
post-test  score  minus  pre-test  score),  BW  and  CF  scored  over  one  standard  deviation 
above  the  average  gain  score  while  HT  and  OY  scored  about  one  standard  deviation 
below  it. 

Subjects   Pre-test   Post-test

BW + CF      47.0       80.7

HT + OY      47.4       63.1
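
The gain-score arithmetic for the two pairs can be checked directly. The short Python sketch below simply applies the definition given above (post-test score minus pre-test score) to the values in the listing, which are presumably pair means in percent correct; the variable names are illustrative. The standard deviation of the gain scores is not reproduced here because it depends on the individual scores in Table 2.

```python
# Scores for each pair of subjects, copied from the listing above.
pair_means = {
    "BW + CF": {"pre": 47.0, "post": 80.7},
    "HT + OY": {"pre": 47.4, "post": 63.1},
}


def gain_score(pre, post):
    """Gain score as defined in the text: post-test minus pre-test."""
    return post - pre


gains = {pair: round(gain_score(s["pre"], s["post"]), 1)
         for pair, s in pair_means.items()}
print(gains)  # {'BW + CF': 33.7, 'HT + OY': 15.7}
```

The more successful pair gained about 33.7 percentage points, roughly twice the 15.7-point gain of the less successful pair.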

The question reduces to: What did BW and CF do, in terms of the indicators,
that HT and OY did not do? Table 4 shows standardized scores for these two pairs
of subjects.

The largest differences (ordered) between these two groups are for the following
ten indicators: 22, 6, 24, 29, 9, 20, 16, 23, 28, and 13. The difference scores for all of
these indicators exceed .90 standardized units.

The first observation is that the majority of these indicators are from the most
cognitively complex set of behaviors delineated, i.e., those in the Thinking and
Planning category, with six of the difference scores greater than .90. Next, there are
three main differences between the two groups in the Data Management category.
Finally, only one significant difference score is from the Activity/Exploration
category. The progression of behaviors across these three categories goes from simply
being active in the environment (Activity/Exploration), to efficient (Data
Management), to finally, effective (Thinking and Planning).

We will now discuss each of these ten indicators in turn as far as their relation
to individual differences in performing in this type of environment. The
between-subject differences will be illustrated in each of the three relevant categories
with excerpts from the subjects' verbal protocols and student procedure graphs,
developed to depict student solution paths.

Thinking  and  Planning  Discriminating  Indicators 

This category represents the more complex learning indicators relating to
experimental behaviors. First, the data show that the subcategory of effective
generalizations was a very good discriminator between these subjects. Overall, BW
and CF attempted to generalize findings across markets (indicators 22 and 23) to see
if developing beliefs extended beyond the current market. This included both
generalizing to related markets (e.g., investigating the effects of a manipulation on
substitute or complementary goods) and testing beliefs out in unrelated markets to see
the limits and extent of a particular concept. To illustrate, BW (more successful) was
careful to try out his developing ideas in different markets to test his hypotheses. In
the first session, he was investigating the tea market, testing the idea that increasing
the population caused an increase in the quantity demanded (it actually shifts the
demand curve). He increased the population and then said:

BW: Well, the quantity demanded did go up, it was 2550 last time,
although I would have thought it would have gone up more, twice as many
people drinking tea [he had doubled the population]. So, quantity
demanded did go up. There was a bit of a shortage. Well, I'd be pretty
sure that it [shows the relationship between population and quantity
demanded]... I think it would, but since I haven't tested it out, I can't
really say. I would change the good to take care of that problem.


Since some of the town factors have global effects and some have limited
effects, it is a good strategy to try out things in different markets. After looking at
the effects of interest rates on the compact car market, then switching to the donut
market to see if interest affected anything there, BW concluded:

BW:  OK,  so  I  guess  interest  rates  only  influence  expensive  things  like 
compact  cars  or  big  cars,  but  not  donuts  or  hamburger  buns.  I  bet  there 
are  things  that  influence  everything,  like  income  influences  everything. 


In contrast, subjects from the less successful group never generalized a concept
across markets (related or unrelated goods). For any given market, they would make
a hypothesis from the current data set and presume that it held across all goods,
without actually testing that notion out. In fact, due to the way the Hypothesis
Menu was implemented in this version of Smithtown, it was possible to state a
number of correct hypotheses from a single market, yet that is not good scientific
behavior.


The next indicator that differentiated the two groups had to do with using the
Planning Menu to set up an experiment, specifying variables to investigate, and
actually conducting an experiment based on those stated variable manipulations
(indicator 20). Sternberg (1981, 1985) discusses two metacomponents, global planning
and local planning, isolated from a complex reasoning task. In a study of planning
behavior in problem solving, he found that more intelligent persons scoring high on
reasoning tests tended to spend relatively more time than low scoring persons on
global (higher-order) planning and relatively less time on local (lower-order) planning.
Poorer reasoners, however, seemed to emphasize local rather than global planning
relative to the better reasoners. Similarly, Anderson (1987) investigated individual
differences in students' solutions to Lisp programming problems and found that the
poorer students tended to be less planful in their problem solving activities. These
findings are similar to our study in that the individuals who do engage in planning an
experiment are more successful (measured by our gain scores criterion) than those
who do not. To illustrate, CF (more successful) decided to test the effects of
Weather on the demand for ice cream (where Weather can range from 1 = cold and
wet, to 10 = warm and dry). From the Planning Menu she chose the variables to
investigate: price, quantity demanded, quantity supplied, surplus, shortage and
weather. After changing the weather index from a medium, default value of 5 to 10,
she said, "OK, then that means, I think, there should be an increased demand for
ice cream." She collected and recorded the data, observed that, indeed, the quantity
demanded of ice cream went up, and chose the framework Same Good, Change
Independent Variable so that she could stay in the ice cream market and manipulate
the weather variable further. From the new Planning Menu, she selected the same
variables as before, then changed the weather: "I'm gonna make the weather really
bad. I'll put it at 1... I think there'll be a surplus now, at the other extreme." This
prediction was confirmed by her data. The other two subjects that were less
successful evidenced much less front-end (higher-order) planning of an experiment
and typically only selected a few (or irrelevant) variables from the Planning Menu.
Often, changing an independent variable has effects on certain other variables, and
those should be focused on in a given experiment. That is, if the population were
increased, that could have an effect on the demand of a good, and in the long run, on
the price of that good.

The next discriminating indicator (indicator 29) reflects the richness and
tenacity of an individual's actions within an experiment, as measured by the average
number of actions taken per experimental episode. A thorough, systematic
investigation of a concept is indicated by more connected actions within an
experiment, while more aimless behavior is seen in fewer connected actions. If a
person were to move around randomly in this environment, making changes, moving
on to new things, and so on, with little or no thread of consistency, then each
experiment would have a small number of actions taken within a given market.
Subjects BW and CF were not random movers. Their method of investigation was to
choose a market and do many things within that market, always observing the effects
of their manipulations, and recording them in the on-line notebook. Thus, the
average number of actions within their experiments was much greater than for
subjects HT and OY. In addition, across the three sessions, the more successful
subjects' number of actions per experiment increased, showing that their experiments
became more complex as they gained additional domain knowledge. The less
successful subjects did not demonstrate a similar increase in complexity of
experiments over time; rather, their average number of actions went up and down
across sessions.

Relevant to these results, Sternberg and Davidson (1982; see also Sternberg,
1985) looked at individual differences in the solution to insight problems where
individuals were free to spend as long as they liked in the solution process. They
computed a correlation of .62 between the time spent and the score on the insight
problems; thus, persistence and involvement in the problems were highly correlated
with success in solution. They argue that more intelligent persons do not give up,
nor fall for the obvious, often incorrect, solutions.
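
The measure behind indicator 29, the average number of actions per experimental episode, can be sketched as a simple computation over a history list. The record layout below (session, experiment identifier, action) and the sample entries are hypothetical; the actual format of the Smithtown history lists is not described in the text.

```python
from statistics import mean

# Hypothetical history list: (session, experiment_id, action).
history = [
    (1, "exp1", "change price"),
    (1, "exp1", "record in notebook"),
    (1, "exp2", "change good"),
    (2, "exp3", "change income"),
    (2, "exp3", "change price"),
    (2, "exp3", "plot demand curve"),
    (2, "exp3", "state hypothesis"),
]


def mean_actions_per_experiment(records):
    """Average number of logged actions per experimental episode."""
    counts = {}
    for _session, experiment_id, _action in records:
        counts[experiment_id] = counts.get(experiment_id, 0) + 1
    return mean(list(counts.values()))


def mean_actions_by_session(records):
    """The same measure computed separately for each session."""
    by_session = {}
    for record in records:
        by_session.setdefault(record[0], []).append(record)
    return {session: mean_actions_per_experiment(rows)
            for session, rows in by_session.items()}


print(mean_actions_per_experiment(history))
print(mean_actions_by_session(history))
```

Collapsing across sessions gives the single index used in this study, while the per-session version corresponds to the change-over-time comparison mentioned above, where the more successful subjects' values rose from session to session.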


This activity is captured in "student procedure graphs" that we constructed for
subjects based on the idea of the problem behavior graphs of Newell and Simon
(1972), showing student actions and the resulting state of knowledge. A state of
knowledge is represented by a node and the application of an operator is represented
by an arrow pointing to the right. The result of the operation is the node at the
head of the arrow. Vertical lines connecting nodes indicate a return to a previous
state of knowledge because no new information was supplied. The operators and their
symbols used for our purposes are listed below. Each operator is recorded above or
below a horizontal arrow; an operator below the arrow indicates that the variable
was changed back to its original default or baseline value. Most of the nodes
(rectangles) contain symbols representing the resulting operation, also listed below.


Symbols for Operators & Variables and for Operations

P - Price
G - Good
H - Hypothesis
FD - Town factor (demand shifts)
FS - Town factor (supply shifts)
GR - Graph
T - Table
R - Notebook Recording
S - Supply Curve
D - Demand Curve
/ - Superimposed curves (e.g., S/D)
X - Error

Learning Goals, or economic concepts that can be discovered, are indicated by
symbols beginning with the letter "L" followed by a number (e.g., the law of demand
- L5). Their meaning can be seen in Figure 1. Figures 13 and 14 are two examples of
how the student procedure graphs are used to visually illustrate the flow of problem
solving activity in more and less efficient individuals (in relation to indicator 29).
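
One way to hold a student procedure graph in a program is sketched below in Python. The class and field names are assumptions made for illustration, and the small fragment at the end is invented; it is not a transcription of Figure 13 or Figure 14.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    """One arrow in the graph: an operator applied at `source`, giving `target`."""
    source: int                      # node number where the operator is applied
    target: int                      # node number at the head of the arrow
    operator: str                    # symbol from the legend, e.g. "P" or "FD"
    result: str = ""                 # symbol written in the target node, e.g. "R"
    reset_to_baseline: bool = False  # True if the operator is drawn below the arrow


@dataclass
class ProcedureGraph:
    steps: List[Step] = field(default_factory=list)
    returns: List[Tuple[int, int]] = field(default_factory=list)  # vertical lines

    def add_step(self, source, target, operator, result="", reset_to_baseline=False):
        self.steps.append(Step(source, target, operator, result, reset_to_baseline))

    def add_return(self, from_node, to_node):
        """Record a vertical line: a return to an earlier state of knowledge."""
        self.returns.append((from_node, to_node))

    def operator_counts(self):
        """Count how often each operator symbol was applied."""
        counts = {}
        for step in self.steps:
            counts[step.operator] = counts.get(step.operator, 0) + 1
        return counts


# Invented fragment for illustration only.
graph = ProcedureGraph()
graph.add_step(1, 2, "FD")              # change a demand-side town factor
graph.add_step(2, 3, "P", result="R")   # change price, record in notebook
graph.add_step(3, 4, "P", result="R")
graph.add_step(4, 5, "H", result="X")   # hypothesis marked as an error
graph.add_return(5, 4)                  # vertical line back to the earlier state
print(graph.operator_counts())
```

In this representation a long run of steps with few returns corresponds to the horizontal, connected movement in the graphs, and many returns relative to steps correspond to the vertical movement.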


Figures 13 and 14 exhibit obvious differences in experimental behavior, using
data from BW and another subject showing below average gain (subject SS) whose
performance well illustrates the contrast between focused and fragmented search.
The horizontal movement depicted in the graph of BW's performance (see Figure 13)
shows much more focused, connected, and persistent behavior than the vertical, less
relevant movement in SS's experimental behavior, as seen in Figure 14. In Figure 13,
BW (more successful) began investigating the large car market by collecting data for
the market when income was $20,000. At nodes 94 through 97, he changed the
average income to $30,000 and collected additional data by changing price three
times. Next, he plotted a demand curve (98) with income at $20,000 and then at
$30,000, and at node 99, he made a hypothesis that when income increased, quantity
demanded increased, and the demand curve would move to the right. During the
period from nodes 100 to 102, he had the computer adjust the price back to
equilibrium. From 103 to 106, he changed income to $40,000 and again had the
computer adjust the price back to equilibrium. The subject said, "It's only if you
change something other than price that you get a new demand curve." Finally, at
node 107 he hypothesized correctly that demand curves shift as a result of changes
other than price (i.e., one characterization or description of what causes demand
curves to shift).

Contrasting with the systematic, persistent performance evidenced by BW,
subject SS (less successful) spent a considerable amount of time generating hypotheses
that were unrelated to the current experiment. Although both subjects were
attempting to characterize a demand shift, Figure 14 and the following summary of
actions clearly demonstrate an ineffective experimental procedure.

At node 73, SS entered the market for gasoline and from nodes 74 to 75,
changed the price from $1.18/gallon to $1.00/gallon and then down to $0.75/gallon.
At (76), she hypothesized that "as the price of complementary goods decrease, the
quantity demanded increases." This was incorrect. She then tried to graph a demand
curve (77-78) but was unsuccessful. During the period involving the nodes 79 to 80,
she hypothesized that as price increases, the demand curve shifts down and to the
left. This was incorrect. The subject then entered the coffee market suddenly and
without any apparent reason (81-82). At (83), she changed labor costs from
$4.00/hour to $20.00/hour followed by three more incorrect hypotheses (84-88):

•  As  labor  costs  increase,  the  quantity  supplied  decreases  and  shortage 
increases. 

•  Quantity  demanded  has  no  relation  to  labor  costs. 

•  Quantity  demanded  has  no  relation  to  the  price  of  resources. 

During (89-90) she decided to change the population from 10,000 to 50,000 and
then returned the labor costs to $4.00/hour. Finally, at (91), she again attempted a
hypothesis that as population increases, quantity demanded increases and quantity
supplied increases. This, too, was not quite right.


A major difference in experimental behavior illustrated here seems to be one of
staying with a problem until it is solved. Subject BW, when his initial hypothesis
turned out to be incorrect, did more experimenting to understand more precisely the
nature of the problem. In contrast, subject SS, who apparently was motivated by just
getting a hypothesis correct, tried different hypotheses, some of which were wild
guesses, as there was no relation between the stated hypotheses and the experiments
actually conducted.


The next indicator to discriminate between more and less effective performance
was indicator 28: changing only a limited number of variables per experiment, where
the fewer variables changed, the better the subsequent performance. BW and CF
(more successful) were very conscientious in changing only one variable at a time per
experiment. Given the freedom of the environment, it often was a great temptation
to make changes to multiple variables concurrently; however, doing so obscures
what was actually responsible for the current state of market affairs. Subjects HT
and OY (less successful) often fell prey to this temptation of making multiple
changes. For example, while investigating the market for large cars
and asked what he was going to do, OY responded, "I want to just go back and
change some stuff." He then proceeded to change interest rates from 15% to 6.7%,
number of suppliers (i.e., large car dealerships) from 10 to 20, consumer preference
(i.e., popularity of large cars) from 5 (medium) to 10 (very high), and then back to 5,
per capita income from $20,000.00 to $25,000.00, and then interest rates from 6.7%
to 9%. This was all done at one time without collecting any data in between the
changes. When he was asked what he would predict would happen as a result
of all of the changes, he said, "I think they'll still buy the cars, because the income
is higher now. . . but the interest rates are higher. . . but since they're making more
income, then I think they can afford it." OY's working memory capacity has
obviously been overloaded at this point and he fails to even consider the effects of the
number of suppliers, consumer preference, or any of the potential interactions. Upon
inspecting the market data, he sees that, in fact, there is an overall surplus of large
cars. This is viewed as confirmation of his prediction, but it obviously is confounded
by the fact that he had raised the number of large car dealerships in Smithtown as
well as the per capita income. These last two actions actually have opposing effects
whereby increasing the number of dealers would result in a surplus of cars while
increasing the income would cause a shortage of cars.
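
The confound described above can be made concrete with a small sketch. The
linear demand and supply functions and all coefficients below are invented for
illustration; they are assumptions, not Smithtown's actual market model. The
sketch only shows why changing income and the number of dealers together hides
the separate effect of each.

```python
# Hypothetical linear market; none of these coefficients come from Smithtown.

def quantity_demanded(price, income):
    # Assumed: demand falls as price rises and grows with per capita income.
    return 500 - price / 50 + income / 100


def quantity_supplied(price, dealers):
    # Assumed: supply grows with price and with the number of dealerships.
    return price / 50 + 20 * dealers


def surplus(price, income, dealers):
    # Positive value = surplus, negative value = shortage.
    return quantity_supplied(price, dealers) - quantity_demanded(price, income)


PRICE = 12_500  # price held fixed so only the other variables vary

baseline = surplus(PRICE, income=20_000, dealers=10)
income_only = surplus(PRICE, income=25_000, dealers=10)   # one variable changed
dealers_only = surplus(PRICE, income=20_000, dealers=20)  # one variable changed
both = surplus(PRICE, income=25_000, dealers=20)          # confounded change

for label, value in [("baseline", baseline), ("income only", income_only),
                     ("dealers only", dealers_only), ("both", both)]:
    print(f"{label:>12}: {value:+.0f}")
```

In this made-up market, raising income alone produces a shortage and adding
dealers alone produces a surplus. Changing both yields a net surplus, which by
itself says nothing about the direction of the income effect.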


The last indicator falling under the Thinking and Planning category involves
collecting sufficient amounts of data before making a hypothesis of any of the
economic concepts (indicator 24). Good scientific methodology involves generalizing
a concept based on enough examples or instances of a phenomenon rather than
inadequate data which may include elements of chance, confounding variables, or
other things. BW (more successful) investigated the concept of "surplus" and its
relationship to price, quantity demanded and quantity supplied. In the following
protocol, it is apparent that his investigation looked at the concept from many
angles, collecting more than enough data before rendering a hypothesis. He had just
had the computer adjust the price of hamburger buns (raising the price),

BW: The price went up a lot, and there's a big surplus. . . . Well, as I
found out before, as price goes up, the quantity demanded goes down,
quantity supplied goes up. So, by now the quantity demanded has gone
below the quantity supplied, and there's a surplus. So, the next time
around I think the price should go back down 'cause there's a lot of
hamburger buns around here. They'll go on sale.

He watched the price slowly converge on equilibrium, interrupting the computer
adjustments with price adjustments himself until he found the equilibrium price
where, "at $1.55 it came out right. . . no surplus and no shortage." He speculated,

BW: OK, so I found out when there's no surplus and no shortage the
price won't change. I could phrase that into a hypothesis also. When
there's a surplus, price decreases, and when there's a shortage, price
increases. . . . If surplus is greater than zero, then the price decreases.

When asked if he could characterize surplus any other way, he responded,

BW: Well, it's just quantity supplied minus quantity demanded. I can
state that. I've got enough examples! There's a surplus when the quantity
supplied is greater than the quantity demanded.

He then used the Hypothesis Menu and formalized the above statement into a
successful specification of "surplus." Immediately afterwards, he used the same data
and logic to characterize "shortage."
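
BW's rules (surplus is quantity supplied minus quantity demanded; price falls
under a surplus, rises under a shortage, and stays put when both are zero) can be
sketched as a simple adjustment loop. This is only an illustration: the linear
curves, the adjustment rate, and the function names are assumptions, not the
rule Smithtown's "Computer Adjust Price" option actually used. The curves were
chosen so that they cross at $1.55, echoing the protocol.

```python
def surplus(price):
    # Hypothetical linear curves, invented for illustration only.
    quantity_supplied = 40 * price
    quantity_demanded = 124 - 40 * price
    # BW's definition: surplus = quantity supplied - quantity demanded.
    # A negative value is a shortage.
    return quantity_supplied - quantity_demanded


def adjust_price(price, rate=0.005):
    # Surplus > 0 lowers the price; shortage (surplus < 0) raises it;
    # zero surplus leaves the price unchanged.
    return price - rate * surplus(price)


def find_equilibrium(price, tolerance=0.01, max_steps=1000):
    # Repeat the adjustment until surplus and shortage are both negligible.
    for step in range(max_steps):
        if abs(surplus(price)) < tolerance:
            return price, step
        price = adjust_price(price)
    return price, max_steps


final_price, steps = find_equilibrium(2.50)
print(f"settled near ${final_price:.2f} after {steps} adjustments")
```

Starting above the crossing point, each step lowers the price by an amount
proportional to the remaining surplus, so the adjustments shrink as the market
approaches equilibrium.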


In contrast, HT (less successful) was content to make predictions and
hypotheses based on single events and non-replicated experiments. This was not a
good strategy for this subject to follow since her data management skills were neither
efficient nor consistent. Moreover, sometimes she forgot or misconstrued what the
previous data were, not bothering to go back and retrieve the omitted data. For
instance, after spending a long time in the final session trying to determine the
influence of population changes on some of the dependent variables, she conducted an
experiment which involved decreasing the population of Smithtown from 10,000 to
5,000. At that time, she was investigating the donut market, and the experimenter
asked what she expected to see as the result of this population decrease,

HT: So, less people will eat [donuts].

Experimenter: What about quantity supplied and price?

HT: When population decreases, demand. . . quantity demanded
decreases, and quantity supplied decreases. . . price increases.

The market actually depicted the price and the quantity supplied remaining the
same while the only change was in the quantity demanded, which changed as a
function of the demand curve shift. Next, HT's actions centered around price
changes, to get an equilibrium price for the donut market in the smaller sized town.
She did not replicate the experiment with the population change, and later, when
attempting to articulate a hypothesis, she remembered erroneous results and showed
little understanding of cause and effect among the variables:

HT: OK, so, I think when population decreased, the price decreased.
That's why there is changes between quantity supplied and quantity
demanded.

Experimenter: What was the first thing that happened?

HT: I think quantity demanded decreased. . . and when quantity
demanded decreased, price decreased. Quantity supplied. . . let's see,
population decreased. . . quantity supplied decreased.

Data  Management  Discriminating  Indicators 

Our more successful subjects, BW and CF, generally exhibited very good data
management skills, using their notebooks efficiently and consistently. Notebook
entries were typically made following variable changes, and variables were included in
their notebooks that had been specified beforehand in the Planning Menu. In
contrast, the less successful subjects (HT and OY) never became fully automatic in
entering data to their notebooks. They continued to forget to record important
information throughout the three sessions and had to rely on the history window to
re-insert forgotten data. They also excluded variables whose values were changed or
that were listed in the Planning Menu. In addition, they continued to omit baseline
data. This latter omission was a major problem when attempting to attribute causes
to market conditions.

Indicators 9 and 13 have to do with the total number of notebook entries and
the number of relevant notebook entries made, respectively. In terms of just the
total number of notebook entries, the more entries, the better the performance. In
terms of the type of notebook entries, the more "relevant" notebook entries made,
overall, the better the performance, where "relevant" variables are those specified in
the Planning Menu as the variables the subject was interested in exploring and
collecting data on. This measure indicates whether the individual used the notebook
efficiently in terms of recording important information.

To  illustrate  the  contrast  in  types  of  data  recording  skills,  Figures  15  and  16 
show  examples  of  students  with  better  and  worse  recording  skills. 

In Figure 15, BW (more successful) entered the tea market and, prior to
changing any variables, decided "to see what the initial conditions are." He
followed the observation with a notebook entry of the baseline data, seen in nodes 1
and 2. At (3), he increased the price of tea from $1.83/box to $2.50/box "to see if
there's a relation between price and quantity demanded and quantity supplied."
This price change was also duly recorded in the notebook. During (4-5), BW
continued to investigate this relationship by decreasing the price two more times,
following each change with a notebook entry. He then graphed a demand curve (6),
and successfully superimposed a supply curve, saving the graph for future reference.
This systematic performance led to the correct induction of the laws of demand and
supply (7-9):

• As price increases, the quantity demanded decreases.

• When price increases, quantity supplied increases.

Our less successful subject, HT, demonstrated inefficient data recording skills (see
Figure 16). Data were rather haphazardly entered into the notebook and the subject
did not systematically record variable changes. Figure 16 illustrates some of the
multiple variable changes and subsequent failure to record sufficient data into the
on-line notebook. It shows that the subject started out in the coffee market and
changed the weather conditions from a mediocre value of 5 to a slightly less pleasant
value of 3 to see if that would affect the demand for coffee, seen in nodes 1 and 2.
She predicted that if the weather decreased (became worse), then the price of coffee
would increase. However, since she had failed to record any baseline data for the
coffee market, she was unable to make the appropriate comparison(s).

She then decided to ignore the weather influences, and at (3), changed the
population from 10,000 to 4,000 persons. She predicted that if the population
decreased, then a surplus would result. But, as with the above situation, she had
failed to record the baseline data for when the population was 10,000, so she
reinserted the necessary data from the past experiments. Next, HT tried to graph
some data at node 4: price by quantity demanded and price by population, but in
each case, there was only one data point per variable, thus no line could be drawn.
She then switched to the market for gasoline (5-6) and raised the price from
$1.50/gallon to $4.00/gallon. Since she again had failed to record the data from the
market when gas was $1.50/gallon, she had to reinsert this information into the
notebook. With this additional data, she tried to graph it again and at (7), she
successfully plotted a demand curve. At node 8, she entered the market for large
cars. Her first action there was not to record the baseline data, but to change income
from $20,000 to $30,000. She predicted that "if the income increases, people will
have more money to buy large cars, so the price of the cars will increase, and the
quantity demanded will increase, the quantity supplied will decrease, and there will
be a shortage." When she saw the data resulting from her increase to the per capita
income, however, price and quantity supplied actually stayed the same while the
quantity demanded did, in fact, go up. Since she did not have the baseline data, she
was unable to tell if her prediction was confirmed or not, since "increase" and
"decrease" have to be interpreted relative to some other data. Finally, at node 9, she
changed interest rates from 15% to 10%, predicting that if interest rates decrease,
then price will decrease and quantity demanded will increase and quantity supplied
will decrease. She stated that she believed her prediction was confirmed, but later
realized that price had stayed the same. Her last action in this segment shows how
she tried to graph price against interest rates, but she did not have enough data.

Indicator 16 is the last major discriminating index in the Data Management
category and deals with the number of specific predictions made by an individual in
relation to the number of general hypotheses. In this case, the higher the ratio, the
better the overall performance. Studies have investigated individual differences
between novices and experts in solving physics problems where an important
distinction between the two groups is that the experts used a "working-forward"
strategy and the novices used a "working-backward" strategy (Simon & Simon, 1978;
Larkin, McDermott, Simon & Simon, 1980; Chi, Glaser & Rees, 1982). They suggest
that the novices may be more data-driven, while experts may be schema-driven in the
sense that their representation of a problem accesses a repertoire of solution methods.
Thus, the novices' limitations are derived from their inability to infer further
knowledge from the literal cues in the problem statement. In contrast, these
inferences necessarily are generated in the context of the relevant knowledge
structures that experts possess. Predictions, in Smithtown, serve as a foundation or
stepping stones to more general, abstract principles and laws of economics. Our more
successful subjects seemed to be able to work forward toward a goal (i.e., they knew
where they were going) in contrast to our less successful subjects who often got stuck
at the more superficial or data level of investigation. To illustrate, CF (more
successful) was interested in looking at the relationship between the coffee and tea
markets, "because they are similar." First she increased the price of coffee and
collected data on the resulting decreased quantity demanded and increased quantity
supplied. Next, she chose the framework "Change good, keep independent variables
the same," changing to the tea market. Since the price of coffee had been increased,
more people had shifted to drinking tea, so the tea market came up with an initial
shortage, confirming her initial prediction that, "If the price of coffee increases, then
the quantity demanded of tea will increase." She remained in these two markets
and went on to investigate the concept of a new equilibrium point and demand shifts.
She continued making predictions and observing the data, then went on to
successfully articulate the rules underlying the higher level concepts. The less
successful subjects skipped among markets, failing to make sufficient predictions in
order to test out developing hypotheses that would have led to more economic
concepts being ultimately discovered.

Activity/Exploration Discriminating Indicator

The last category to be discussed has to do with the number of times the
subject had the computer make a price adjustment toward equilibrium (indicator 6).
From the beginning sessions, our better subjects immediately grasped the utility of
letting the computer make price adjustments while both the pattern of the changes it
made and the effects on the market condition were observed. Subject BW (more
successful) said, after his first computer change of the price, and when asked if the
change was in accord with his expectations,

"Well, yes, I thought so. Quantity demanded for hamburger buns was
very high, and there were very few hamburger buns, so, it seems that
suppliers would be able to get more for them. So, the price went up and
there's still a shortage of hamburger buns. If I let the computer adjust
the price again, the price will probably go up again."

He demonstrated an understanding that when the computer changed the price, the
opportunity for observing systematic changes and relationships was provided.
Although he did not have enough data to conceptualize "equilibrium point," he had
started to understand that when shortages exist, prices go up. Our less successful
subjects tried to use the option Computer Adjust Price, but they did not really
grasp its purpose. It was revealed in the second session that subject HT (less
successful) had no idea what was going on:

HT: Just now I had the "Computer Adjust Price."

Experimenter: Yes. Do you understand what's going on?

HT: No, I have no idea about that.

Experimenter: What happened when you chose that? How did it adjust
the price?

HT: So, the price now increased from $1.70 to $1.90, and the quantity
demanded decreased, decreased just a little bit. The quantity supplied
increased a little bit too. No surplus, and the shortage is 6. Population
is the same. So the price increased. . . .

Experimenter: . . . What would happen if you let the computer adjust
the price again? Would it go up, down, or stay the same?

HT: I don't know much about the "Computer Adjust Price." It can
increase or decrease, reduce the price. . . .

The  subject  continued  to  have  difficulty  with  this  throughout  the  first  two  sessions 
(or  4/5  of  the  entire  time  with  Smithtown),  not  realizing  the  benefits  of  observing  the 
computer  make  price  adjustments  toward  equilibrium. 

Performance differences between our two groups were probably a function of
the interaction of all of the aforementioned performance indicators. The behaviors
that differentiated the subjects consisted of: generalizing concepts across markets
where the generalizations were a result of well thought out and executed plans,
having sufficient data collected prior to the generalization, engaging in more complex
experiments within a given market and not moving randomly among markets (i.e.,
staying in an experiment long enough to extract valuable information), changing
variables in a parsimonious and systematic fashion, recording important data in the
notebook from different experiments, and generating and testing predictions that
could lead to the induction of economic principles and laws.

General Discussion

The comparisons between the economics classroom and the experimental group
in terms of their pre- and post-test results suggest that learning in the exploratory
world is at least as effective as traditional classroom learning. In fact, when learning
time is compared, the students interacting with Smithtown spent less than half the
amount of time formally learning economics compared to the length of time spent by
the students in the economics classroom. It is possible that a group receiving both
classroom instruction and the intelligent discovery world could do even better. This
remains an empirical question.


Our second, more compelling concern was with the experimental group. In
particular, we wanted to know how individuals learn or do not learn in this type of
environment, and on what measures the better and poorer learners differ. The
contrasting pairs of subjects we illustrated differed mostly on measures relating to
thinking and planning skills (i.e., effective experimental behaviors) with fewer but
significant differences in terms of data management skills. The behaviors that
differentiated the subjects were the following:

1.  Generalizing  concepts  across  markets.  The  better  subjects  would  try  out 
economic  concepts  in  different  markets  to  see  if  they  were  supported 
while  the  less  effective  subjects  would  not  bother  to  extend  an  experiment 
across  markets. 

2. Engaging in more complex experiments within a given market and not
moving randomly among markets. Typically, the better subjects had
many more actions within a given experiment and investigated fewer
markets overall compared to the less effective subjects.

3. Changing only one variable at a time and holding all others constant.
The biggest problem for the poorer subjects was that they persisted in
changing multiple variables simultaneously. The better subjects changed
fewer variables at a time, typically just single variables.

4. Basing generalizations on sufficient data. We set as our criterion having at
least three related rows of notebook entries before using the Hypothesis
Menu. The more successful subjects did not attempt to make general
hypotheses prior to collecting enough data on a given concept while the
less successful subjects were content to make careless and impulsive
generalizations based on inadequate data.

5. Conducting an experiment based on a planned manipulation or set of
manipulations. The planning and inferencing abilities of the better
subjects allowed them to set up an experiment and execute it thoroughly
whereas advance (i.e., higher-level) planning by the less successful subjects
was rarely evidenced throughout the experimental sessions.

6. Generating and testing experimental predictions. The better subjects
tended to be more hypothesis or rule-driven (working forward towards a
goal) while the less efficient subjects were more data-driven in
experimentation. When evidence does not confirm a hypothesis, further
experimentation is required to modify the hypothesis. The better subjects
generally recognized and implemented this approach, while others
engaged in less systematic activities.

7. Entering data into the on-line notebook. Better subjects had more
notebook entries overall compared to the less effective subjects. In
addition, those entries tended to be more consistent with, and relevant to,
the focus of their investigation.

8.  Using  the  computer  to  make  price  adjustments  of  a  good  towards 
equilibrium. 
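
Several of the behaviors listed above are countable, and a rough sketch of how
three of them might be tallied from a session log follows. The `Experiment`
record, its field names, and the sample logs are hypothetical and invented for
illustration; only the three-row criterion (item 4), the variables-changed
measure (item 3), and the prediction-to-hypothesis ratio (indicator 16) come
from the text. This is not the system's actual indicator code.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    # Hypothetical log record for one experiment.
    variables_changed: list   # independent variables altered in this experiment
    notebook_rows: int        # related notebook rows recorded before hypothesizing
    predictions: int = 0      # specific predictions made
    hypotheses: int = 0       # general hypotheses stated


def mean_variables_changed(experiments):
    # Item 3: fewer variables changed per experiment is better.
    return sum(len(e.variables_changed) for e in experiments) / len(experiments)


def hypotheses_with_sufficient_data(experiments, min_rows=3):
    # Item 4: count hypotheses backed by at least three related notebook rows.
    return sum(e.hypotheses for e in experiments if e.notebook_rows >= min_rows)


def prediction_hypothesis_ratio(experiments):
    # Indicator 16: specific predictions relative to general hypotheses.
    predictions = sum(e.predictions for e in experiments)
    hypotheses = sum(e.hypotheses for e in experiments)
    return predictions / hypotheses if hypotheses else None


# Made-up logs contrasting a systematic and a scattered session.
systematic = [
    Experiment(["price"], notebook_rows=4, predictions=3, hypotheses=1),
    Experiment(["income"], notebook_rows=3, predictions=2, hypotheses=1),
]
scattered = [
    Experiment(["interest rate", "suppliers", "preference", "income"],
               notebook_rows=0, predictions=1, hypotheses=1),
    Experiment(["population", "labor costs"],
               notebook_rows=1, predictions=0, hypotheses=2),
]

for name, log in [("systematic", systematic), ("scattered", scattered)]:
    print(name, mean_variables_changed(log),
          hypotheses_with_sufficient_data(log),
          prediction_hypothesis_ratio(log))
```

Keeping each indicator as a separate function over the same log mirrors the idea
that the indicators are independent tallies that can later be correlated with an
outcome measure.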


Demographic information was obtained along with the pre-test battery from all
subjects, including two questions: (1) what science courses the subject had taken
since high school, and (2) what their major was. Subject BW (more effective) had
taken just two science courses (physics I and II) and he was a sophomore, majoring in
math. Subject CF (more effective) had taken three science courses (physics I, II, and
III) and was also a sophomore, majoring in electrical engineering. In our less effective
group, subject HT had taken five science courses (physics, two semesters of calculus,
Fortran, and chemistry) and she was a freshman, majoring in pharmacy, while subject
OY had taken three science courses (chemistry, biology and physics) and was a
sophomore, majoring in electrical engineering. These pairs could have differed in their
scientific, investigative behaviors as a function of past academic courses or variables
relating to learning style differences. Thus, according to a hypothesis that different
backgrounds were a cause of the observed differences, we would have expected the less
scientific subjects to have taken fewer science courses. This was not the case. In fact,
the less successful group had an average of 4 prior science courses while our more
successful group only had an average of 2.5 science courses since high school. In
addition, each of the subjects was a science major. Among the original ten subjects in
our experimental group, this same pattern was found. Dividing the subjects into two
groups of five each based on their gain score, the two groups had the same number of
declared science majors in each (i.e., 3 per group). However, the "less successful"
group had taken considerably more science courses since high school (total = 27)
compared to the "more successful" group (total = 8). Thus, the idea of differential
exposure to science training seems not to be a major factor in determining who will
demonstrate better scientific behaviors.

Although this study focused on contrasting subjects in a descriptive and
exploratory sense, the question arises whether the findings generalize to the
population at large. As part of the Learning Abilities Measurement Program (LAMP)
at the Air Force Human Resources Laboratory, the first author is currently testing a
large group of subjects (i.e., basic recruits at Lackland Air Force Base, Texas) with a
modified version of the system which includes 44 performance indicators that are
automatically tallied in real time and summarized by the computer at the end of a
three-and-a-half-hour session. Using a measure of general intelligence as the
dependent variable (i.e., the AFQT, a composite score derived from the Armed
Services Vocational Aptitude Battery), we list below the results of a correlational
study of the indicators with AFQT score that map onto the above descriptions of
major individual differences (N = 527):

1. Generalizing concepts across related or unrelated markets did not
correlate with AFQT scores in the larger study. Reviewing the code, we
found that our conditions for these two indicators were much too
stringent.

2. Engaging in more complex experiments within a given market was tallied
by the average number of actions per experiment. This indicator had a
significant correlation with AFQT score: r = .17; p < .001. That is,
taking more connected actions in an experiment was associated with a
higher AFQT score. Related to the nature of the experimentation, we also
tallied the total number of markets investigated. Three regression
analyses were run on the data: forward, backward and stepwise, with
AFQT score as the dependent variable. In all solutions, "Number of
Markets" was one of the five most predictive variables with an inverse
relationship to AFQT. That is, the fewer markets investigated, the
higher the predicted AFQT score.

3. The average number of independent variables changed at one time (i.e.,
per experiment) had a significant negative correlation to AFQT score (r =
-.23; p < .001), implying that the fewer variables changed at a time, the
better the performance.

4. Making hypotheses based on sufficient data was estimated by the
indicator computing if the subject had at least three rows of related
notebook entries before using the Hypothesis Menu. This correlated with
AFQT score in our larger sample: r = .30; p < .001. Thus the better
subjects relied on more data before formulating general principles and
laws.

5. When a subject specifies his/her intentions for an experiment via a
contrived manipulation of a variable or set of variables in the Planning
Menu and actually conducts the experiment with those variables, this
indicator is incremented. There was a significant correlation between
planned performance and AFQT score (r = .16; p < .001), implying that
the more intelligent persons tended to engage in more higher-level,
advanced planning of an experiment.

6. Making and testing predictions of experimental outcomes and then
observing the results for confirmation or negation of the prediction is
effective interrogative behavior, and is tallied by this indicator. In this
larger study, the overall number of predictions that a subject made was
correlated with AFQT score (r = .18; p < .001).


7. The quantity and quality of on-line notebook entries were two significant
indicators discriminating among subjects. First, the total number of
entries in the notebook was significantly correlated with AFQT score
(r = .26; p < .001); higher AFQT scores were therefore associated with
more notebook entries overall. The second indicator, the number of
variables entered into the notebook that had been specified in the
Planning Menu, also correlated with AFQT score (r = .30; p < .001). This
implies that higher AFQT scores are associated with consistent behaviors;
that is, formulating a planned set of variables to investigate and
reliably entering those variables into the notebook.


8. The indicator tallying the number of times the student had the computer
make a price adjustment was not correlated with AFQT score in our
larger study, confirming our suspicions that it is more of a cognitive style
preference than a learning skill discriminator.
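
The coefficients reported in the list are modest in size, so it may help to see why they are nonetheless significant with 527 subjects. The following is a minimal sketch, not the analysis the authors ran: it converts each reported r into a t statistic and compares it with an assumed two-tailed critical value for p < .001. The function names, the dictionary labels, and the critical value of 3.31 are assumptions introduced here for illustration.

```python
import math

N_SUBJECTS = 527  # sample size reported for the larger study

# Assumed two-tailed critical t for p < .001 with 525 degrees of freedom
# (slightly above the normal-distribution value of 3.291).
T_CRITICAL_001 = 3.31


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def smallest_significant_r(t_critical: float, n: int) -> float:
    """Smallest |r| whose t statistic reaches t_critical with n subjects."""
    return t_critical / math.sqrt((n - 2) + t_critical ** 2)


# Correlations with AFQT score as reported in items 2 through 7.
# The labels are shorthand chosen for this sketch.
reported_r = {
    "actions per experiment": 0.17,
    "variables changed per experiment": -0.23,
    "sufficient data before hypothesis": 0.30,
    "planned experiments": 0.16,
    "predictions made": 0.18,
    "notebook entries": 0.26,
    "planned variables recorded": 0.30,
}

# True where the magnitude of t exceeds the assumed critical value.
significant = {
    name: abs(t_statistic(r, N_SUBJECTS)) > T_CRITICAL_001
    for name, r in reported_r.items()
}

for name, r in reported_r.items():
    print(f"{name}: r = {r:+.2f}, t = {t_statistic(r, N_SUBJECTS):.2f}")
print(f"smallest significant |r|: "
      f"{smallest_significant_r(T_CRITICAL_001, N_SUBJECTS):.3f}")
```

Under these assumptions every reported coefficient clears the threshold, and any |r| above roughly .14 would do so with a sample of this size.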


Other indicators from the larger study that significantly correlated with AFQT
score included the following: (a) Total number of actions taken in the experimental
sessions. This correlated with AFQT score (r = .26; p < .001), implying that the more
intelligent persons were more active, overall, than the less intelligent persons. This
must be viewed in light of the other indicators relating to the quality of performance,
however, as it is not a matter of simply being "busy" in the environment, but of being
active in a connected, directed, systematic sense. (b) Total number of economic concepts
learned (r = .18; p < .001); higher AFQT scores were therefore associated
with learning more concepts in the 3.5-hour session. Finally, (c) the number of
experimental frameworks utilized by the subjects correlated with AFQT score (r =
.27; p < .001), so the experimental frameworks were employed as a planning procedure
more by the successful individuals than by the less successful persons.

Thus, the larger study seems to corroborate, to some extent, the findings from
the descriptive analyses, extending and more precisely delineating individual
differences in learning from this type of environment.

A limitation of the present study was the collapsing of data across sessions for
this initial investigation. This can result in a loss of information that is valuable for
looking at individual differences and changes in knowledge and skills over time.
Another limitation was that the use of difference scores on the economic tests as the
measure of success was not ideal, because our primary focus for Smithtown
was on the learning of good inquiry skills, and only secondarily on the acquisition of
economic knowledge. The ideal criterion (and data we plan to collect) should be the
transfer of skills across domains; i.e., how well students perform in a new
environment with a similar structure/architecture but which differs in content from
Smithtown. Currently, there are several other systems being developed that fit these
criteria, and further studies are planned which will investigate transfer of learning of
these inquiry skills to new domains.

In general, it appears that in the rather complex task involved in this study,
many of the behaviors that differentiated successful and less successful subjects are
similar to those identified in previous studies with both laboratory and more realistic
tasks. Individual differences in performance in our exploratory environment involved
the following dimensions: generalization, goal setting and planning, more or less
structured search, specific performance heuristics, and memory management. Better
subjects tended to think in terms of generalizing their hypotheses and explorations
beyond the specific experiment or market they were working on. They conceived of
a lawful regularity as a general principle and as a description of a class of events
rather than a local description. Better reasoners were more sensitive to the existence
of deeper explanatory principles in addition to local data
descriptions; they appeared to realize that discovery was not only a function of data,
but that they needed to generate some rule that could provide them with a goal for
their actions. In this sense they tended to be more rule- or hypothesis-driven than the
less successful subjects.

Better reasoners also engaged in more connected actions, that is, more structured
search. They conceived of a particular market as a rich environment in which many
actions needed to be taken in order to develop a structured understanding;
disconnected probes did not assist them in their attempt at understanding. Less
successful subjects, on the other hand, moved more frequently between markets.
Their behavior was more fragmented and displayed a breadth of exploration, in
contrast to a more depth-like search, in their attempt to establish meaning in a
particular context.

Planning behaviors differentiated individuals: successful subjects planned
their manipulations and experiments. Given the opportunity, they would structure a
plan and then carry it out with specific information. The immediacy of carrying out
some action was more salient to the less successful subjects, comparable to jumping
to equation solving in physics problems.

The successful individuals in our study employed more powerful heuristics
compared to the less successful individuals. They manipulated fewer variables,
holding variables constant while one variable was systematically explored. Less
successful subjects did not seem to realize the power of this heuristic, and for them it
was a less salient activity. Successful subjects took their time to generate sufficient
evidence before coming to a conclusion, while the less successful subjects were more
impulsive and attempted to induce generalizations based on inadequate information.
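
The heuristic of holding variables constant while exploring one can be made concrete with a small sketch. It assumes that each experiment is recorded as a dictionary of independent-variable settings; that representation, the function names, and the variable names (price, income, population) are hypothetical and are not taken from Smithtown's actual logs.

```python
from statistics import mean


def variables_changed(previous: dict, current: dict) -> int:
    """Count the settings that differ between two consecutive experiments."""
    return sum(1 for name in current if previous.get(name) != current[name])


def mean_variables_changed(experiments: list) -> float:
    """Average number of variables changed from one experiment to the next."""
    changes = [
        variables_changed(before, after)
        for before, after in zip(experiments, experiments[1:])
    ]
    return mean(changes) if changes else 0.0


# Hypothetical subject who varies price alone and holds the rest constant.
systematic = [
    {"price": 1.0, "income": 20000, "population": 10000},
    {"price": 1.5, "income": 20000, "population": 10000},
    {"price": 2.0, "income": 20000, "population": 10000},
]

# Hypothetical subject who changes several variables at once.
haphazard = [
    {"price": 1.0, "income": 20000, "population": 10000},
    {"price": 2.0, "income": 30000, "population": 10000},
    {"price": 0.5, "income": 15000, "population": 25000},
]

print(mean_variables_changed(systematic))
print(mean_variables_changed(haphazard))
```

A lower average corresponds to the more controlled style of experimentation described above; the sketch only illustrates the counting and makes no claim about how the system itself scored subjects.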

The necessity to manage memory was evident in the performance of the better
subjects. They realized that they needed to store and display the information they
had collected. Their data management performance was goal-driven in the sense
that the data collected were relevant to the current focus of their investigation. This
contrasts with the poorer subjects' data management behaviors, which were mostly
inconsistent and often unrelated to an overall goal in their experimentation.

In regard to inductive problem solving, as Greeno and Simon state, and as Klahr
and Dunbar describe in terms of the interplay between rules and instances, the best
learning strategy is a combination of bottom-up and top-down processing. In our subjects,
this seemed to be the case: the better subjects would predict variable relationships
and then test those hypotheses, concurrently exploring and collecting data which
led to further generalizations. Our less effective subjects seemed to be limited to a
more data-driven (or bottom-up) approach, often falling short of grasping the larger
picture. This is in accord with findings from investigations of novice-expert differences in
problem solving (e.g., Larkin, McDermott, Simon and Simon, 1980).

Furthermore, the importance of higher level planning in this inductive
discovery environment is in agreement with studies of individual differences in
reasoning tasks (e.g., Sternberg, 1985). Better subjects consistently planned an
experiment and then executed it to completion, according to plan, in sharp contrast
to the more haphazard, less planful approach applied by less successful subjects in
their experimental methodologies.

In conclusion, we have described an initial study of individual differences in
learning from an exploratory environment where students had the opportunity to
engage in active, discovery learning of economic concepts by manipulating variables
in a hypothetical town and observing the repercussions. Overall, the system worked
as we had hoped: tutoring on the scientific inquiry skills resulted in learning the
domain knowledge, as evidenced by performance on the post-test battery.

We have begun to delineate skills and behaviors which are important to
scientific inference and discovery learning. Although there is currently not very
much research being conducted in this area, the behaviors we have identified in this
chapter fit with findings from related research (e.g., Klahr & Dunbar, 1987; Langley
et al., 1987). In addition, these specific behaviors relate to individual differences
found in studies on problem solving and concept formation. From an instructional
perspective, the behaviors we have identified can serve as a focal point for relevant
intervention studies. Related and complementary projects that are planned for the
immediate future include (L. Schauble & K. Raghavan, personal communication,
December 15, 1987): (a) extending the analysis of inference and discovery behavior
across content domains, (b) studying the influence of preconceptions and qualitative
understanding on discovery behavior, (c) identifying intrasubject variability, (d)
coaching on discovery behavior, (e) improving discovery behavior through training
and practice, and (f) implementing a generalized "discovery shell" to make the
discovery environment portable across topics. The work in the above areas should
yield useful educational tools and insight that can be coordinated with science
education.

[Figure: Student procedure graph of a more successful subject, where horizontal
movement of the graph indicates market investigation prior to the second hypothesis.]

[Figure: Student procedure graph of a less successful subject, where vertical
movement of the graph indicates a lack of experimentation.]

[Figure: Student procedure graph of a more successful subject showing good data
recording skills.]

IX. Efficient Tool Usage

13. Number of "relevant" notebook entries divided by
total number of notebook entries, where "relevant"
refers to those variables specified in the Planning
Menu.

14. Number of times the table package was used
"correctly" divided by the total number of times the
table was used, where "correctly" means fewer than 6
variables tabulated, and sorting was done on
variables with differing values.

15. Number of times the graph package was used
"correctly" divided by the total number of times the
graph was used, where "correctly" means plotting
relevant variables, saving graphs, and superimposing
graphs with a shared axis.


16. Number of specific predictions made divided by the
number of general hypotheses made. The larger this
ratio, the more data-driven the inquiry.

17. Number of correct hypotheses divided by the total
number of hypotheses made.


18. Number of notebook entries of Planning Menu items.

19. Number of times notebook entries of Planning Menu
items were made divided by the number of planning
opportunities the subject had.

20. Number of times variables were changed that had been
specified beforehand in the Planning Menu.


21. Number of times an experiment was replicated.

22. Number of times a concept was generalized across
unrelated goods.

23. Number of times a concept was generalized across
related goods.

24. Number of times the student had sufficient data for a
generalization (i.e., at least 3 data points in the
notebook before using the Hypothesis Menu).


25. Number of times a change to an independent variable
was sufficiently large (i.e., greater than 10%
of the possible range).

26. Number of times one of the experimental frames was
selected (i.e., chose "same good, change variable,"
"change good, same variables," or "change good, change
variable").

27. Number of times the Prediction Menu was used to
specify a particular outcome to an event.

28. Number of variables changed per experiment. (In the
initial sessions, this should be a low number for
"effectiveness," while in the later sessions, this
should be a higher number as the domain knowledge
increases and the student can deal with
interrelationships among variables.)

29. Average number of actions per experiment. This
should be an increasing function over sessions.

30. Number of economic concepts learned per session.
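
Several of the indicators listed above are simple ratios or threshold checks, and a short sketch may make their definitions easier to follow. The code below is illustrative only and is not the system's own implementation: the function names, argument shapes, and example values are assumptions, while the thresholds (at least 3 data points, more than 10% of the possible range) come from indicators 24 and 25.

```python
from typing import Optional


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the tool was never used."""
    return numerator / denominator if denominator else None


def relevant_entry_ratio(notebook_entries: list, planned_variables: set) -> Optional[float]:
    """Indicator 13: share of notebook entries naming a planned variable."""
    relevant = sum(1 for name in notebook_entries if name in planned_variables)
    return ratio(relevant, len(notebook_entries))


def has_sufficient_data(data_points: int, minimum: int = 3) -> bool:
    """Indicator 24: enough notebook data before using the Hypothesis Menu."""
    return data_points >= minimum


def is_large_change(old: float, new: float, low: float, high: float,
                    fraction: float = 0.10) -> bool:
    """Indicator 25: change exceeds the given fraction of the possible range."""
    return abs(new - old) > fraction * (high - low)


# Hypothetical notebook: three of the four entries name planned variables.
entries = ["price", "quantity", "price", "weather"]
planned = {"price", "quantity"}

print(relevant_entry_ratio(entries, planned))
print(has_sufficient_data(2), has_sufficient_data(3))
print(is_large_change(10, 12, 0, 100), is_large_change(10, 25, 0, 100))
```

Returning None for an unused tool keeps "never used" distinct from "used, but never correctly," which a ratio of zero would otherwise conflate; that design choice is also an assumption of this sketch.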